-
Generative large language models (LLMs) exhibit impressive capabilities, which can be further augmented by integrating a pre-trained vision model into the original LLM to create a multimodal LLM (MLLM). However, this integration often significantly decreases performance on natural language understanding and generation tasks, compared to the original LLM. This study investigates this issue using the LLaVA MLLM, treating the integration as a continual learning problem. We evaluate five continual learning methods to mitigate forgetting and identify a technique that enhances visual understanding while minimizing linguistic performance loss. Our approach reduces linguistic performance degradation by up to 15% over the LLaVA recipe, while maintaining high multimodal accuracy. We also demonstrate the robustness of our method through continual learning on a sequence of vision-language tasks, effectively preserving linguistic skills while acquiring new multimodal capabilities.
Free, publicly-accessible full text available August 11, 2026.
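The abstract does not name the mitigation technique that was ultimately chosen; as a concrete illustration of the continual-learning framing, below is a minimal sketch of one classic candidate, elastic weight consolidation (EWC), on a toy PyTorch model. All shapes, data, and the penalty strength `lam` are hypothetical, and EWC is not claimed to be the paper's method.

```python
import torch
import torch.nn as nn

# Toy stand-in for a language model head; the real setting would attach a
# pre-trained vision encoder to an LLM (all shapes here are made up).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
loss_fn = nn.CrossEntropyLoss()

# Fake "language task" batch, used only to estimate parameter importance.
x_lang = torch.randn(64, 16)
y_lang = torch.randint(0, 4, (64,))

# 1) Diagonal Fisher approximation on the original (language) task.
model.zero_grad()
loss_fn(model(x_lang), y_lang).backward()
fisher = {n: p.grad.detach() ** 2 for n, p in model.named_parameters()}
anchor = {n: p.detach().clone() for n, p in model.named_parameters()}

# 2) Fine-tune on the new (multimodal) task with a quadratic penalty that
#    discourages drift in parameters important for the language task.
x_mm = torch.randn(64, 16)
y_mm = torch.randint(0, 4, (64,))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 100.0  # penalty strength; hypothetical value

for _ in range(100):
    opt.zero_grad()
    task_loss = loss_fn(model(x_mm), y_mm)
    ewc = sum((fisher[n] * (p - anchor[n]) ** 2).sum()
              for n, p in model.named_parameters())
    (task_loss + lam * ewc).backward()
    opt.step()
```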
-
Smith-Renner, A.; Taele, P. (Eds.) Recruiting older adults for research studies is a challenging endeavor. We conducted an interview study to understand older adults’ preferences and expectations, with the goal of building a recommender system that supports the selection of suitable research studies. Our findings suggest that sharing the results of the studies they participated in would motivate older adults to participate in more studies and would give them a feeling of self-accomplishment and belonging. We list 15 design implications based on our user research and present a prototype system built on these design implications.
-
This paper presents JEDAI Explains Decision-Making AI (JEDAI), an AI system designed for outreach and educational efforts aimed at non-AI experts. JEDAI features a novel synthesis of research ideas from integrated task and motion planning and explainable AI. It helps users create high-level, intuitive plans while ensuring that they will be executable by the robot, and it provides users with customized explanations of errors, improving their understanding of AI planning as well as the limits and capabilities of the underlying robot system.
-
McCulloch, R. (Ed.) Varying coefficient models (VCMs) are widely used for estimating nonlinear regression functions for functional data. Their Bayesian variants, which place Gaussian process priors on the functional coefficients, have received limited attention in massive data applications, mainly due to prohibitively slow posterior computations with Markov chain Monte Carlo (MCMC) algorithms. We address this problem using a divide-and-conquer Bayesian approach. We first create a large number of data subsamples of much smaller size. We then formulate the VCM as a linear mixed-effects model and develop a data augmentation algorithm for obtaining MCMC draws on all the subsets in parallel. Finally, we aggregate the MCMC-based estimates of the subset posteriors into a single Aggregated Monte Carlo (AMC) posterior, which serves as a computationally efficient alternative to the true posterior distribution. Theoretically, we derive minimax optimal posterior convergence rates for the AMC posteriors of both the varying coefficients and the mean regression function, and we quantify the required orders of the subset sample sizes and the number of subsets. The empirical results show that combination schemes satisfying our theoretical assumptions, including the AMC posterior, deliver better estimation performance than their main competitors across diverse simulations and in a real data analysis.
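To make the divide-and-conquer pattern concrete, here is a minimal sketch on a toy Gaussian-mean model where each subset posterior is available in closed form. The precision-weighted averaging of aligned draws is one classic combination scheme (consensus-Monte-Carlo style); it stands in for, rather than reproduces, the paper's AMC aggregation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: N observations from N(mu, 1); the target is the posterior of mu.
N, K, n_draws = 100_000, 10, 5_000
y = rng.normal(2.0, 1.0, size=N)
subsets = np.array_split(y, K)

# Step 1: sample each subset posterior (in parallel, in a real pipeline).
# In this conjugate toy model the subset posterior is closed form,
# mu | y_j ~ N(ybar_j, 1/n_j) under a flat prior, so direct draws
# stand in for per-subset MCMC.
subset_draws = np.stack([
    rng.normal(s.mean(), np.sqrt(1.0 / len(s)), size=n_draws)
    for s in subsets
])  # shape (K, n_draws)

# Step 2: aggregate aligned draws with precision weights (consensus-style).
weights = np.array([len(s) for s in subsets], dtype=float)
combined = (weights[:, None] * subset_draws).sum(axis=0) / weights.sum()

print(f"combined posterior: mean {combined.mean():.4f}, sd {combined.std():.5f}")
print(f"full-data posterior sd: {np.sqrt(1.0 / N):.5f}")  # should roughly agree
```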
-
Linear mixed-effects models play a fundamental role in statistical methodology. A variety of Markov chain Monte Carlo (MCMC) algorithms exist for fitting these models, but they are inefficient in massive data settings because every iteration of any such algorithm passes through the full data. Many divide-and-conquer methods have been proposed to solve this problem, but they lack theoretical guarantees, impose restrictive assumptions, or require complex computational algorithms. Our focus is one such method, the Wasserstein Posterior (WASP), which has become popular due to its optimal theoretical properties under general assumptions. Unfortunately, practical implementation of the WASP either requires solving a complex linear program or is limited to one-dimensional parameters; the former is inefficient, and the latter fails to capture the joint posterior dependence structure of multivariate parameters. We develop a new algorithm for computing the WASP of multivariate parameters that is easy to implement and applies to any model where the posterior distribution of the parameter belongs to a location-scatter family of probability measures. The algorithm is introduced for linear mixed-effects models, with both implementation details and theoretical properties. It outperforms the current state-of-the-art method in inference on functions of the covariance matrix of the random effects across diverse numerical comparisons.
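For location-scatter families the combined posterior inherits a convenient structure: in the Gaussian special case, the 2-Wasserstein barycenter's mean is the average of the subset means, and its covariance solves a known fixed-point equation. Below is a minimal sketch of that textbook iteration with hypothetical subset summaries; it illustrates the idea rather than the paper's exact algorithm.

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_barycenter(means, covs, iters=50):
    """2-Wasserstein barycenter of Gaussians N(means[j], covs[j]): the mean
    is the average of the means; the covariance solves a fixed-point
    equation, iterated here a fixed number of times."""
    m = np.mean(means, axis=0)
    S = np.mean(covs, axis=0)  # a reasonable initialization
    for _ in range(iters):
        R = np.real(sqrtm(S))
        T = np.mean([np.real(sqrtm(R @ C @ R)) for C in covs], axis=0)
        R_inv = np.linalg.inv(R)
        S = R_inv @ T @ T @ R_inv
        S = (S + S.T) / 2.0  # symmetrize away numerical noise
    return m, S

# Hypothetical subset posteriors: summarize each subset's MCMC draws by a
# mean vector and scatter matrix, then combine them into the barycenter.
rng = np.random.default_rng(1)
subset_mcmc = [rng.multivariate_normal(rng.normal(0, 0.1, 2), np.eye(2), 1000)
               for _ in range(8)]
means = [d.mean(axis=0) for d in subset_mcmc]
covs = [np.cov(d.T) for d in subset_mcmc]
m, S = gaussian_barycenter(means, covs)
print("barycenter mean:", m)
print("barycenter scatter:\n", S)
```

Draws from the combined posterior can then be generated as m + S^{1/2} z for standard normal z, which preserves the joint dependence structure that one-dimensional combinations lose.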
-
The brain modifies its synaptic strengths during learning in order to better adapt to its environment. However, the underlying plasticity rules that govern learning are unknown. Many proposals have been suggested, including Hebbian mechanisms, explicit error backpropagation, and a variety of alternatives. It is an open question what specific experimental measurements would need to be made to determine whether any given learning rule is operative in a real biological system. In this work, we take a "virtual experimental" approach to this problem. Simulating idealized neuroscience experiments with artificial neural networks, we generate a large-scale dataset of learning trajectories of aggregate statistics measured across a variety of neural network architectures, loss functions, learning rule hyperparameters, and parameter initializations. We then take a discriminative approach, training linear and simple non-linear classifiers to identify learning rules from features based on these observables. We show that different classes of learning rules can be separated solely on the basis of aggregate statistics of the weights, activations, or instantaneous layer-wise activity changes, and that these results generalize to limited access to the trajectory and to held-out architectures and learning curricula. We identify the statistics of each observable that are most relevant for rule identification, finding that statistics from network activities across training are more robust to unit undersampling and measurement noise than those obtained from the synaptic strengths. Our results suggest that activation patterns, available from electrophysiological recordings of post-synaptic activities on the order of several hundred units, frequently measured at wider intervals over the course of learning, may provide a good basis on which to identify learning rules.
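The discriminative pipeline reduces to two steps: compute aggregate statistics of the measured trajectories, then train a classifier to separate rule classes. In this minimal sketch the synthetic trajectories and the per-rule "drift" signature are entirely made up, standing in for the paper's simulated training runs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def aggregate_stats(trajectory):
    """Summarize an (epochs, units) activity trajectory with simple
    per-epoch aggregate statistics, concatenated into one feature vector."""
    return np.concatenate([
        trajectory.mean(axis=1),
        trajectory.std(axis=1),
        np.linalg.norm(trajectory, axis=1),
    ])

def fake_run(rule, epochs=20, units=300):
    """Synthetic stand-in for a simulated training run; the per-rule drift
    is a made-up signature, not a claim about real learning rules."""
    drift = 0.05 if rule == 0 else 0.15
    return rng.normal(0.0, 1.0, (epochs, units)) + drift * np.arange(epochs)[:, None]

labels = [r for r in (0, 1) for _ in range(200)]
X = np.stack([aggregate_stats(fake_run(r)) for r in labels])
y = np.array(labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```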